Results 1 - 4 of 4
1.
bioRxiv ; 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36909524

ABSTRACT

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.
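
The pretrain-then-fine-tune idea described in this abstract can be illustrated with a deliberately tiny sketch. It is not the authors' model or code: the linear model, the synthetic "source" and "target" data, and all names below are assumptions made for illustration only. It fits a model on a larger source dataset, then continues training briefly on a small target dataset, and compares that against training on the target data from scratch for the same number of steps.

```python
# Toy illustration of transfer learning (assumed setup, not the paper's method).
# Model: y = w * x + b, trained by gradient descent on mean squared error.

def mse(xs, ys, w, b):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, w, b, lr=0.1, steps=5):
    """Run `steps` gradient-descent updates starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: a data-rich source task and a related, data-poor target task.
source_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
source_y = [2 * x + 1.0 for x in source_x]
target_x = [-1.0, 0.0, 1.0]
target_y = [2 * x + 1.5 for x in target_x]

# Pretrain on the source task, then fine-tune briefly on the target task.
w_pre, b_pre = train(source_x, source_y, 0.0, 0.0, steps=500)
w_ft, b_ft = train(target_x, target_y, w_pre, b_pre, steps=5)

# Baseline: the same short training budget on the target task, from scratch.
w_sc, b_sc = train(target_x, target_y, 0.0, 0.0, steps=5)

finetuned_loss = mse(target_x, target_y, w_ft, b_ft)
scratch_loss = mse(target_x, target_y, w_sc, b_sc)
print("fine-tuned loss:", finetuned_loss)
print("from-scratch loss:", scratch_loss)
```

In this toy setting the fine-tuned model starts close to the target solution, so a short training budget is enough; whether the same holds for a real promoter dataset depends on how related the pretraining data are.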

2.
PLoS Comput Biol ; 18(6): e1010238, 2022 06.
Article in English | MEDLINE | ID: mdl-35767567

ABSTRACT

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
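
The contrastive task described in this abstract (pick the held-out homolog out of a set of randomly sampled IDRs) can be sketched as a softmax cross-entropy over similarity scores. This is only a minimal sketch under assumptions: the two-dimensional vectors stand in for learned sequence embeddings, averaging the homolog set into one query vector and scoring by dot product are choices made here for illustration, and none of the names come from the paper.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def contrastive_loss(query, candidates, positive_index):
    """Cross-entropy of choosing the positive candidate by dot-product score."""
    scores = [dot(query, c) for c in candidates]
    m = max(scores)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[positive_index]

# Hypothetical embeddings: a set of homologous IDRs, one held-out homolog,
# and IDRs sampled at random from the proteome.
homolog_set = [[1.0, 0.0], [0.8, 0.2]]
held_out = [0.9, 0.1]
random_idrs = [[0.0, 1.0], [-1.0, 0.0]]

query = mean_vector(homolog_set)
candidates = [held_out] + random_idrs  # the positive is at index 0

loss = contrastive_loss(query, candidates, positive_index=0)
predicted = max(range(len(candidates)), key=lambda i: dot(query, candidates[i]))
print("loss:", loss, "predicted index:", predicted)
```

Minimizing this loss over many homolog sets pushes the encoder to represent whatever features homologs share, which is the sense in which conservation serves as the training signal.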


Subjects
Intrinsically Disordered Proteins , Proteome , Amino Acid Sequence , Molecular Evolution , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/metabolism
3.
Curr Protoc ; 1(5): e113, 2021 May.
Article in English | MEDLINE | ID: mdl-33961736

ABSTRACT

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies to traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remain for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline.
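
The idea behind Basic Protocol 4, using fixed-length embeddings as input features for a classifier, can be sketched without any particular library. The example below does not use the bio_embeddings API. The three-dimensional vectors are made-up stand-ins for real per-protein embeddings, the labels are hypothetical, and the nearest-centroid classifier was chosen here only because it fits in a few lines.

```python
import math
from collections import defaultdict

def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid per label."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Hypothetical per-protein embeddings with hypothetical solubility labels.
train_embeddings = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 1.0, 0.9],
]
train_labels = ["soluble", "soluble", "insoluble", "insoluble"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.8, 0.0, 0.1])
print("predicted label:", prediction)
```

With real embeddings the vectors would typically have hundreds or thousands of dimensions, and a held-out test set would be needed to judge the classifier.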


Subjects
Artificial Intelligence , Deep Learning , Machine Learning , Natural Language Processing , Proteins
4.
BMJ Open ; 8(6): e019110, 2018 06 30.
Article in English | MEDLINE | ID: mdl-29961001

ABSTRACT

OBJECTIVE: To characterise the early diffusion of indirect comparison meta-analytic methods to study drugs.
DESIGN: Systematic literature synthesis.
DATA SOURCES: Cochrane Database of Systematic Reviews, EMBASE, MEDLINE, Scopus and Web of Science.
STUDY SELECTION: English language papers that used indirect comparison meta-analytic methods to study the efficacy or safety of three or more interventions, where at least one was a drug.
DATA EXTRACTION: The number of publications and authors was plotted by year and type: methodological contribution, review or empirical application. Author and methodological details were summarised for empirical applications, and animated coauthorship networks were created to visualise contributors by country and affiliation type (academia, industry, government or other) over time.
RESULTS: We identified 477 papers (74 methodological contributions, 42 reviews and 361 empirical applications) by 1689 distinct authors from 1997 to 2013. Prior to 2002, only three applications were published, with contributions from the USA (n=2) and Canada (n=1). The number of applications gradually increased annually with rapid uptake between 2011 and 2013 (n=254, 71%). Early diffusion occurred primarily in Europe with the first application credited to the UK in 2003. Application spread to other European countries in 2005, and may have been supported by regulatory requirements for drug approval. By the end of 2013, contributions included 49% credited to Europe (22% UK, 27% other), 37% credited to North America (11% Canada, 26% USA) and 14% from other regions.
CONCLUSION: Indirect comparison meta-analytic methods are an important innovation for health research. Although Canada and the USA were the first to apply these methods, Europe led their diffusion. The increase in uptake of these methods may have been facilitated by acceptance by regulatory agencies, which are calling for more comparative drug effect data to assist in drug accessibility and reimbursement decisions.
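
For readers unfamiliar with the methods this abstract surveys, one of the simplest forms is the adjusted indirect comparison through a common comparator, often called the Bucher method. The abstract does not say which specific methods the included papers used, so the sketch below is only an illustration of the general idea. The effect estimates are made-up numbers on a log odds ratio scale, and the function and variable names are hypothetical.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Estimate the A-vs-B effect from A-vs-C and B-vs-C estimates.

    The point estimate is the difference of the two direct effects; the
    variances add because the two estimates come from independent trials.
    """
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (d_ab - z * se_ab, d_ab + z * se_ab)
    return d_ab, se_ab, ci

# Hypothetical inputs: log odds ratios of drug A and drug B, each vs placebo C.
d_ab, se_ab, ci = indirect_comparison(d_ac=-0.5, se_ac=0.2, d_bc=-0.2, se_bc=0.15)
print("A vs B log odds ratio:", d_ab)
print("standard error:", se_ab)
print("approximate 95% interval:", ci)
```

The indirect estimate is always less precise than either direct estimate because the variances add, which is one reason network approaches that pool direct and indirect evidence were developed.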


Subjects
Drug Evaluation , Meta-Analysis as Topic , Publications/statistics & numerical data , Publications/trends , Authorship , Europe , Humans , North America , Randomized Controlled Trials as Topic